Automatic Construction of Japanese WordNet
نویسندگان
چکیده
Although WordNets have been developed for a number of languages, no attempts to construct a Japanese WordNet have been known to exist. Taking this into account, we launched a project to automatically translate the Princeton WordNet into Japanese by a method of unsupervised word-sense disambiguation using bilingual comparable corpora. The method we propose aligns English word associations with those in Japanese and iteratively calculates a correlation matrix of Japanese translations of an English word versus its associated words. It then determines the Japanese translation for the English word in a synset by calculating scores for translation candidates according to the correlation matrix and the associated words appearing in the gloss appended to the synset. This method is not robust because a gloss only contains a few associated words. To overcome this difficulty, we extended the method so that it retrieves texts by using the gloss as a query and uses the retrieved texts as well as the gloss to calculate scores for translation candidates. A preliminary experiment using Wall Street Journal and Nihon Keizai Shimbun corpora demonstrated that the proposed method is promising for constructing a Japanese WordNet.
منابع مشابه
Automatic Construction of Persian ICT WordNet using Princeton WordNet
WordNet is a large lexical database of English language, in which, nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms (synsets). Each synset expresses a distinct concept. Synsets are interlinked by both semantic and lexical relations. WordNet is essentially used for word sense disambiguation, information retrieval, and text translation. In this paper, we propose s...
متن کاملJapanese Hyponymy Extraction based on a Term Similarity Graph
Semantic relations between words, such as hyponymy, synonymy and meronymy, have various information access applications (e.g. Web search) and the automatic extraction of such relations from corpora is an important research problem in natural language processing. For the Japanese language, there exist several linguistic resources that contain these relations, such as the Japanese Wordnet, Nihong...
متن کاملAutomatic Construction of Wordnets by Using
WordNet is one of the most valuable lexical resources in the Natural Language Processing community. Unfortunately, the benefits of building a WordNet for the Macedonian language have never been recognized. Due to the time and labor intensive process of manual building of such a lexical resource, we were inspired to develop a method for its automated construction. In this paper, we present a new...
متن کاملAutomatic Persian WordNet Construction
In this paper, an automatic method for Persian WordNet construction based on Prenceton WordNet 2.1 (PWN) is introduced. The proposed approach uses Persian and English corpora as well as a bilingual dictionary in order to make a mapping between PWN synsets and Persian words. Our method calculates a score for each candidate synset of a given Persian word and for each of its translation, it select...
متن کاملAutomatic WordNet Mapping Using Word Sense Disambiguation
This paper presents the automatic construction of a Korean WordNet from pre-existing lexical resources. A set of automatic WSD techniques is described for linking Korean words collected from a bilingual MRD to English WordNet synsets. We will show how individual linking provided by each WSD method is then combined to produce a Korean WordNet for nouns.
متن کامل